NSF PAR Search | NSF Public Access Repository

A Gittins Policy for Optimizing Tail Latency

https://doi.org/10.1145/3727109

Harlev, Amit; Yu, George; Scully, Ziv (May 2025, Proceedings of the ACM on Measurement and Analysis of Computing Systems)

We consider the problem of scheduling to minimize asymptotic tail latency in an M/G/1 queue with unknown job sizes. When the job size distribution is heavy-tailed, numerous policies that do not require job size information (e.g. Processor Sharing, Least Attained Service) are known to be strongly tail optimal, meaning that their response time tail has the fastest possible asymptotic decay. In contrast, for light-tailed size distributions, only in the last few years have policies been developed that outperform simple First-Come First-Served (FCFS). The most recent of these is γ-Boost, which achieves strong tail optimality in the light-tailed setting. But thus far, all policies that outperform FCFS in the light-tailed setting, including γ-Boost, require known job sizes. In this paper, we design a new scheduling policy that achieves strong tail optimality in the light-tailed M/G/1 with unknown job sizes. Surprisingly, the optimal policy turns out to be a variant of the Gittins policy, but with a novel and unusual feature: it uses a negative discount rate. Our work also applies to systems with partial information about job sizes, covering γ-Boost as an extreme case when job sizes are in fact fully known.
Free, publicly-accessible full text available May 27, 2026
Cost-aware Bayesian optimization via the Pandora's box Gittins index

Xie, Qian; Astudillo, Raul; Frazier, Peter; Scully, Ziv; Terenin, Alexander (December 2024, Curran Associates, Inc.)

Full Text Available
Transform Analysis of Preemption Overhead in the M/G/1

https://doi.org/10.1145/3695411.3695419

Ramakrishna, Shefali; Scully, Ziv (September 2024, ACM SIGMETRICS Performance Evaluation Review)

Preemptive scheduling policies, which allow pausing jobs mid-service, are ubiquitous because they allow important jobs to receive service ahead of unimportant jobs that would otherwise delay their completion. The canonical example is Shortest Remaining Processing Time (SRPT), which preemptively serves the job with least remaining work at every moment in time [9]. There is a robust literature analyzing response time (elapsed time between a job's arrival and completion) in the M/G/1 queue under many preemptive policies [6, 10, 11], shedding light on questions such as how preemption affects the mean and tail of response time, and whether preemption is unfair towards low-priority jobs.
Full Text Available
A Gittins Policy for Optimizing Tail Latency

https://doi.org/10.1145/3695411.3695418

Harlev, Amit; Yu, George; Scully, Ziv (September 2024, ACM SIGMETRICS Performance Evaluation Review)

Service level objectives (SLOs) for queueing systems typically relate to the tail of the system's response time distribution T. The tail is the function mapping a time t to the probability P[T > t]. SLOs typically ask that high percentiles of T are not too large, i.e. that P[T > t] is small for large t.
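As a minimal sketch of the tail definition above, the following estimates P[T > t] from a list of observed response times and checks a percentile-style SLO against it. The function names and the sample data are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from bisect import bisect_right


def empirical_tail(samples, t):
    """Estimate P[T > t] as the fraction of samples strictly greater than t."""
    ordered = sorted(samples)
    # bisect_right counts the samples that are <= t.
    return (len(ordered) - bisect_right(ordered, t)) / len(ordered)


def meets_slo(samples, t, max_violation):
    """Hypothetical SLO check: at most max_violation of jobs may exceed t."""
    return empirical_tail(samples, t) <= max_violation


# Made-up response times, for illustration only.
response_times = [0.2, 0.5, 0.7, 1.1, 1.4, 2.0, 2.3, 3.8, 5.0, 9.5]
tail_at_2 = empirical_tail(response_times, 2.0)
slo_ok = meets_slo(response_times, 5.0, 0.1)
```

With these samples, four of the ten response times exceed 2.0, so the estimated tail at t = 2.0 is 0.4.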
Full Text Available
Strongly Tail-Optimal Scheduling in the Light-Tailed M/G/1

https://doi.org/10.1145/3656011

Yu, George; Scully, Ziv (May 2024, Proceedings of the ACM on Measurement and Analysis of Computing Systems)

We study the problem of scheduling jobs in a queueing system, specifically an M/G/1 with light-tailed job sizes, to asymptotically optimize the response time tail. This means scheduling to make P[T > t], the chance a job's response time exceeds t, decay as quickly as possible in the t → ∞ limit. For some time, the best known policy was First-Come First-Served (FCFS), which has an asymptotically exponential tail: P[T > t] ~ C e^{-γt}. FCFS achieves the optimal decay rate γ, but its tail constant C is suboptimal. Only recently have policies that improve upon FCFS's tail constant been discovered. But it is unknown what the optimal tail constant is, let alone what policy might achieve it. In this paper, we derive a closed-form expression for the optimal tail constant C, and we introduce γ-Boost, a new policy that achieves this optimal tail constant. Roughly speaking, γ-Boost operates similarly to FCFS, but it pretends that small jobs arrive earlier than their true arrival times. This significantly reduces the response time of small jobs without unduly delaying large jobs, improving upon FCFS's tail constant by up to 50% with only moderate job size variability, with even larger improvements for higher variability. While these results are for systems with full job size information, we also introduce and analyze a version of γ-Boost that works in settings with partial job size information, showing it too achieves significant gains over FCFS. Finally, we show via simulation that γ-Boost has excellent practical performance.
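The idea of serving jobs as if small ones had arrived earlier can be sketched as a single-server simulation that orders waiting jobs by "boosted" arrival time (true arrival time minus a size-dependent boost). This is only a rough illustration under stated assumptions: the server here is non-preemptive, and `illustrative_boost` is a hypothetical decreasing function of size, not the boost function derived in the paper.

```python
import math
from heapq import heappush, heappop


def illustrative_boost(size, gamma=0.5):
    """ASSUMPTION: a made-up boost that shrinks as size grows (not the paper's)."""
    return math.exp(-gamma * size) / gamma


def simulate(jobs, boost):
    """Non-preemptive single server; jobs is a list of (arrival, size) sorted by arrival.

    Whenever the server frees up, it serves the waiting job with the smallest
    boosted arrival time. Returns each job's response time, in input order.
    """
    waiting = []                      # heap of (boosted_arrival, job_index)
    responses = [None] * len(jobs)
    clock, next_job, done = 0.0, 0, 0
    while done < len(jobs):
        # Admit every job that has arrived by the current time.
        while next_job < len(jobs) and jobs[next_job][0] <= clock:
            arrival, size = jobs[next_job]
            heappush(waiting, (arrival - boost(size), next_job))
            next_job += 1
        if not waiting:               # server idle: jump to the next arrival
            clock = jobs[next_job][0]
            continue
        _, k = heappop(waiting)
        clock += jobs[k][1]           # serve job k to completion
        responses[k] = clock - jobs[k][0]
        done += 1
    return responses


jobs = [(0.0, 5.0), (1.0, 4.0), (2.0, 0.5)]
fcfs = simulate(jobs, lambda size: 0.0)       # zero boost reduces to FCFS
boosted = simulate(jobs, illustrative_boost)
```

In this toy instance the small third job overtakes the larger second job under the boosted ordering, cutting its response time from 7.5 to 3.5 while the second job's grows from 8.0 to 8.5.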
Full Text Available
When Does the Gittins Policy Have Asymptotically Optimal Response Time Tail in the M/G/1?

https://doi.org/10.1287/opre.2022.0038

Scully, Ziv; van Kreveld, Lucas (February 2024, Operations Research)

We consider scheduling in the M/G/1 queue with unknown job sizes. It is known that the Gittins policy minimizes mean response time in this setting. However, the behavior of the tail of response time under Gittins is poorly understood, even in the large-response-time limit. Characterizing Gittins’s asymptotic tail behavior is important because if Gittins has optimal tail asymptotics, then it simultaneously provides optimal mean response time and good tail performance. In this work, we give the first comprehensive account of Gittins’s asymptotic tail behavior. For heavy-tailed job sizes, we find that Gittins always has asymptotically optimal tail. The story for light-tailed job sizes is less clear-cut: Gittins’s tail can be optimal, pessimal, or in between. To remedy this, we show that a modification of Gittins avoids pessimal tail behavior, while achieving near-optimal mean response time.
Full Text Available
Performance of the Gittins policy in the G/G/1 and G/G/k, with and without setup times

https://doi.org/10.1016/j.peva.2023.102377

Hong, Yige; Scully, Ziv (January 2024, Performance Evaluation)

Full Text Available
Heavy-Traffic Optimal Size- and State-Aware Dispatching

https://doi.org/10.1145/3639035

Xie, Runhan; Grosof, Isaac; Scully, Ziv (February 2024, Proceedings of the ACM on Measurement and Analysis of Computing Systems)

Dispatching systems, where arriving jobs are immediately assigned to one of multiple queues, are ubiquitous in computer systems and service systems. A natural and practically relevant model is one in which each queue serves jobs in FCFS (First-Come First-Served) order. We consider the case where the dispatcher is size-aware, meaning it learns the size (i.e. service time) of each job as it arrives; and state-aware, meaning it always knows the amount of work (i.e. total remaining service time) at each queue. While size- and state-aware dispatching to FCFS queues has been extensively studied, little is known about optimal dispatching for the objective of minimizing mean delay. A major obstacle is that no nontrivial lower bound on mean delay is known, even in heavy traffic (i.e. the limit as load approaches capacity). This makes it difficult to prove that any given policy is optimal, or even heavy-traffic optimal. In this work, we propose the first size- and state-aware dispatching policy that provably minimizes mean delay in heavy traffic. Our policy, called CARD (Controlled Asymmetry Reduces Delay), keeps all but one of the queues short, then routes as few jobs as possible to the one long queue. We prove an upper bound on CARD's mean delay, and we prove the first nontrivial lower bound on the mean delay of any size- and state-aware dispatching policy. Both results apply to any number of servers. Our bounds match in heavy traffic, implying CARD's heavy-traffic optimality. In particular, CARD's heavy-traffic performance improves upon that of LWL (Least Work Left), SITA (Size Interval Task Assignment), and other policies from the literature whose heavy-traffic performance is known.
Optimal Scheduling in the Multiserver-job Model under Heavy Traffic

https://doi.org/10.1145/3570612

Grosof, Isaac; Scully, Ziv; Harchol-Balter, Mor; Scheller-Wolf, Alan (December 2022, Proceedings of the ACM on Measurement and Analysis of Computing Systems)

Multiserver-job systems, where jobs require concurrent service at many servers, occur widely in practice. Essentially all of the theoretical work on multiserver-job systems focuses on maximizing utilization, with almost nothing known about mean response time. In simpler settings, such as various known-size single-server-job settings, minimizing mean response time is merely a matter of prioritizing small jobs. However, for the multiserver-job system, prioritizing small jobs is not enough, because we must also ensure servers are not unnecessarily left idle. Thus, minimizing mean response time requires prioritizing small jobs while simultaneously maximizing throughput. Our question is how to achieve these joint objectives. We devise the ServerFilling-SRPT scheduling policy, which is the first policy to minimize mean response time in the multiserver-job model in the heavy traffic limit. In addition to proving this heavy-traffic result, we present empirical evidence that ServerFilling-SRPT outperforms all existing scheduling policies for all loads, with improvements by orders of magnitude at higher loads. Because ServerFilling-SRPT requires knowing job sizes, we also define the ServerFilling-Gittins policy, which is optimal when sizes are unknown or partially known.
Full Text Available
The most common queueing theory questions asked by computer systems practitioners

https://doi.org/10.1145/3543146.3543148

Harchol-Balter, Mor; Scully, Ziv (June 2022, ACM SIGMETRICS Performance Evaluation Review)

This document examines five performance questions which are repeatedly asked by practitioners in industry: (i) My system utilization is very low, so why are job delays so high? (ii) What should I do to lower job delays? (iii) How can I favor short jobs if I don't know which jobs are short? (iv) If some jobs are more important than others, how do I negotiate importance versus size? (v) How do answers change when dealing with a closed-loop system, rather than an open system? All these questions have simple answers through queueing theory. This short paper elaborates on the questions and their answers. To keep things readable, our tone is purposely informal throughout. For more formal statements of these questions and answers, please see [14].
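One standard queueing-theory effect that bears on question (i) can be sketched with the Pollaczek-Khinchine formula for the M/G/1 queue under FCFS, where mean waiting time is E[W] = λ E[S²] / (2(1 − ρ)) with load ρ = λ E[S]. The example below is an illustration under that model and is not necessarily the answer the paper gives; the function name and the numbers are assumptions made for the example.

```python
def mg1_fcfs_mean_wait(arrival_rate, mean_size, second_moment):
    """Pollaczek-Khinchine mean waiting time in an M/G/1 FCFS queue."""
    load = arrival_rate * mean_size
    if not 0 <= load < 1:
        raise ValueError("load must be in [0, 1) for a stable queue")
    return arrival_rate * second_moment / (2 * (1 - load))


# Both systems run at 20% utilization with mean job size 1.
low_variability = mg1_fcfs_mean_wait(0.2, 1.0, 2.0)      # exponential sizes: E[S^2] = 2
high_variability = mg1_fcfs_mean_wait(0.2, 1.0, 100.0)   # same mean, E[S^2] = 100
```

At identical low utilization, the high-variability system has a mean wait of about 12.5 versus about 0.25, a fifty-fold difference that comes entirely from the second moment of job size.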
Full Text Available
